Context specific protein function prediction.

Authors

  • Naoki Nariai
  • Simon Kasif
Abstract

Although whole-genome sequencing of many organisms has been completed, numerous newly discovered genes remain functionally uncharacterized. Because sequence-based approaches alone often cannot annotate these genes, it has been proposed to assign them putative functions using high-throughput data such as protein-protein interaction (PPI) information. In addition to PPI data, information such as a protein's localization within the cell can improve protein function prediction in two ways: 1) as a direct indicator of protein function (e.g., nucleolus-localized proteins may be involved in ribosome biogenesis), and 2) as a means of refining noisy PPI data. In the latter case, localization distinguishes two types of PPIs: interactions between co-localized proteins (more reliable) and interactions between differently localized proteins (potentially less reliable). In this paper, we propose a probabilistic method that predicts protein function from PPI data and localization information, using a Bayesian network to model the dependencies among protein function, PPI data, and localization. Our cross-validation experiments showed that, in some cases, our method (conditioning PPI data on localization information) significantly improves prediction precision compared with a simple Naive Bayes method that assumes PPI data and localization information are conditionally independent given protein function. Finally, we predicted 57 functionally unknown genes to be "ribosome biogenesis" proteins.
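The difference between the two models can be made concrete with a toy posterior calculation. The sketch below is hypothetical and is not the authors' Bayesian network: it assumes a binary function label F, a single localization label L for the query protein, PPI evidence recorded as whether each interaction partner is already annotated with the function, and made-up probability tables (`PRIOR`, `P_LOC`, `P_PARTNER_NB`, `P_PARTNER_COND`). The Naive Bayes score is P(F) P(L | F) ∏ P(e_i | F); the conditioned score replaces each P(e_i | F) with P(e_i | F, c_i), where c_i records whether partner i shares the query protein's localization. This is only one possible way to condition PPI evidence on localization.

```python
# Toy comparison of Naive Bayes vs. localization-conditioned PPI evidence.
# All probability tables below are hypothetical illustration values.

# P(F): prior that a protein has the target function f
PRIOR = {True: 0.1, False: 0.9}

# P(L | F): localization of the query protein as a direct indicator of f
P_LOC = {
    True: {"nucleolus": 0.6, "nucleus": 0.3, "cytoplasm": 0.1},
    False: {"nucleolus": 0.05, "nucleus": 0.35, "cytoplasm": 0.6},
}

# Naive Bayes: P(partner annotated with f | F), independent of localization
P_PARTNER_NB = {True: 0.5, False: 0.1}

# Conditioned model: P(partner annotated with f | F, co_localized),
# keyed by (F, co_localized); co-localized interactions are more informative
P_PARTNER_COND = {
    (True, True): 0.7,
    (False, True): 0.1,
    (True, False): 0.2,
    (False, False): 0.1,
}


def bernoulli(p: float, observed: bool) -> float:
    """Likelihood of a binary observation under success probability p."""
    return p if observed else 1.0 - p


def posterior(loc: str, partners: list[tuple[bool, str]], conditioned: bool) -> float:
    """Return P(F = True | query localization, partner evidence).

    partners holds (partner_has_f, partner_localization) pairs.
    If conditioned is False, partner evidence is scored with P_PARTNER_NB
    (Naive Bayes); otherwise with P_PARTNER_COND, selected by whether the
    partner shares the query protein's localization.
    """
    scores = {}
    for f in (True, False):
        score = PRIOR[f] * P_LOC[f][loc]
        for has_f, partner_loc in partners:
            if conditioned:
                p = P_PARTNER_COND[(f, partner_loc == loc)]
            else:
                p = P_PARTNER_NB[f]
            score *= bernoulli(p, has_f)
        scores[f] = score
    # Normalize the two unnormalized joint scores into a posterior
    return scores[True] / (scores[True] + scores[False])


if __name__ == "__main__":
    # Query protein in the nucleus; both annotated partners are cytoplasmic
    partners = [(True, "cytoplasm"), (True, "cytoplasm")]
    print(f"Naive Bayes: {posterior('nucleus', partners, conditioned=False):.3f}")  # 0.704
    print(f"Conditioned: {posterior('nucleus', partners, conditioned=True):.3f}")  # 0.276
```

With these illustrative numbers, two annotated partners located in a different compartment push the Naive Bayes posterior to about 0.70, whereas the conditioned model, which treats cross-compartment interactions as less reliable, gives about 0.28. In practice the conditional probability tables would have to be estimated from annotated training data.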

Similar resources

Protein Secondary Structure Prediction: a Literature Review with Focus on Machine Learning Approaches

DNA sequence, containing all genetic traits, is not a functional entity. Instead, it is converted into protein sequences through the processes of transcription and translation. This protein sequence later takes on a 3D structure, which is a functional unit and can manage biological interactions using the information encoded in DNA. Every life process one can figure is undertaken by proteins with specific functio...

Review: Prediction of protein function and pathways in the genome era

The growing number of completely sequenced genomes adds new dimensions to the use of sequence analysis to predict protein function. Compared with the classical knowledge transfer from one protein to a similar sequence (homology-based function prediction), knowledge about the corresponding genes in other genomes (orthology-based function prediction) provides more specific information about the p...

A Survey of Computational Methods for Protein Function Prediction

Rapid advances in high-throughput genome sequencing technologies have resulted in millions of protein-encoding gene sequences with no functional characterization. Automated protein function annotation or prediction is a prime problem for computational methods to tackle in the post-genomic era of big molecular data. While recent community-driven experiments demonstrate that the accuracy of funct...

Prediction of kinase-specific phosphorylation sites through an integrative model of protein context and sequence.

Identifying kinase substrates and the specific phosphorylation sites they regulate is an important factor in understanding protein function regulation and signalling pathways. Computational prediction of kinase targets - assigning kinases to putative substrates, and selecting from protein sequence the sites that kinases can phosphorylate - requires the consideration of both the cellular context...

Hierarchical Ensemble Methods for Protein Function Prediction

Protein function prediction is a complex multiclass multilabel classification problem, characterized by multiple issues such as the incompleteness of the available annotations, the integration of multiple sources of high dimensional biomolecular data, the imbalance of several functional classes, and the difficulty of univocally determining negative examples. Moreover, the hierarchical relations...

Prediction of Protein Sub-Mitochondria Locations Using Protein Interaction Networks

Background: Prediction of protein localization is among the most important problems in bioinformatics and is used to predict the location of proteins in cells and in organelles such as mitochondria. In this study, several machine learning algorithms are applied to predict intracellular protein locations. These algorithms use features extracted from pro...

Journal:
  • Genome informatics. International Conference on Genome Informatics

Volume 18

Publication date: 2007